
Stein Variational Gradient Descent







Neural Information Processing Systems

Neural transport augmented sampling, first introduced by Parno and Marzouk (2018), is a general method for using normalizing flows to sample from a given density π: once a transport map T is fitted, samples can be generated from π(θ) by running an MCMC chain in the Z-space and pushing these samples onto the Θ-space using T. Neural transport augmented samplers have subsequently been extended by Hoffman et al. In this paper, we propose an equivariant Stein variational gradient descent algorithm for sampling from densities that are invariant to symmetry transformations. A further contribution of our work is using this equivariant sampling method to efficiently train equivariant energy-based models for probabilistic modeling and inference.
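As a sketch of the Z-space-to-Θ-space mechanism described above, the following runs a random-walk Metropolis chain against the pulled-back density log π(T(z)) + log|det J_T(z)| and pushes the accepted states through T. A fixed one-dimensional affine map stands in for a trained normalizing flow, and all names here are illustrative, not code from the paper.

```python
import numpy as np

def neutra_sample(log_pi, T, log_det_J, n_steps=20000, step=2.0, seed=0):
    """Sample from pi(theta) by running random-walk Metropolis in Z-space
    on log pi(T(z)) + log|det J_T(z)|, then pushing states through T.
    A trained normalizing flow would supply T and log_det_J; here they
    are arbitrary callables (illustrative stand-ins)."""
    rng = np.random.default_rng(seed)
    log_p = lambda z: log_pi(T(z)) + log_det_J(z)   # pulled-back log-density
    z = 0.0
    cur = log_p(z)
    zs = []
    for _ in range(n_steps):
        prop = z + step * rng.normal()
        lp = log_p(prop)
        if np.log(rng.uniform()) < lp - cur:        # Metropolis accept/reject
            z, cur = prop, lp
        zs.append(z)
    return np.array([T(z) for z in zs])             # push Z-samples onto Theta-space

# With T(z) = 3 + 2z (the exact transport for a N(3, 2^2) target),
# the Z-space chain simply targets a standard normal.
theta = neutra_sample(log_pi=lambda t: -0.5 * ((t - 3.0) / 2.0) ** 2,
                      T=lambda z: 3.0 + 2.0 * z,
                      log_det_J=lambda z: np.log(2.0))
```

With a well-fitted map the pulled-back density is close to the flow's base distribution, which is what makes the Z-space chain mix easily.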





Stein Variational Gradient Descent With Matrix-Valued Kernels

Neural Information Processing Systems

Stein variational gradient descent (SVGD) is a particle-based inference algorithm that leverages gradient information for efficient approximate inference. In this work, we enhance SVGD by leveraging preconditioning matrices, such as the Hessian and Fisher information matrix, to incorporate geometric information into SVGD updates. We achieve this by presenting a generalization of SVGD that replaces the scalar-valued kernels in vanilla SVGD with more general matrix-valued kernels. This yields a significant extension of SVGD, and more importantly, allows us to flexibly incorporate various preconditioning matrices to accelerate the exploration in the probability landscape. Empirical results show that our method outperforms vanilla SVGD and a variety of baseline approaches over a range of real-world Bayesian inference tasks.
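The update described above can be sketched in a few lines of NumPy: vanilla SVGD with an RBF kernel and median-heuristic bandwidth, where passing a constant preconditioning matrix Q_inv (e.g. an inverse-Hessian approximation) applies the simplest matrix-valued kernel K(x, y) = k(x, y) Q⁻¹. This is a minimal illustration under those assumptions, not the authors' implementation.

```python
import numpy as np

def svgd_step(X, score, step=0.1, Q_inv=None):
    """One SVGD update. X: (n, d) particles; score(X): (n, d) gradients
    of the target log-density at the particles. Q_inv=None recovers
    vanilla scalar-kernel SVGD."""
    n, d = X.shape
    diffs = X[:, None, :] - X[None, :, :]          # diffs[j, i] = x_j - x_i
    sq = np.sum(diffs ** 2, axis=-1)
    h2 = np.median(sq) / np.log(n + 1) + 1e-8      # median-heuristic bandwidth
    K = np.exp(-sq / h2)                           # scalar RBF kernel matrix
    gradK = (-2.0 / h2) * diffs * K[:, :, None]    # grad of k wrt its first argument
    phi = (K @ score(X) + gradK.sum(axis=0)) / n   # driving term + repulsive term
    if Q_inv is not None:                          # matrix-valued kernel k(x, y) * Q^{-1}
        phi = phi @ Q_inv.T
    return X + step * phi
```

Iterating `svgd_step` on particles initialized away from the target moves them toward the target while the repulsive term keeps them spread out; richer preconditioners (state-dependent Hessians, Fisher matrices) require the more general matrix-valued kernels the paper develops.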


A Finite-Particle Convergence Rate for Stein Variational Gradient Descent

Neural Information Processing Systems

We provide the first finite-particle convergence rate for Stein variational gradient descent (SVGD), a popular algorithm for approximating a probability distribution with a collection of particles. Specifically, whenever the target distribution is sub-Gaussian with a Lipschitz score, SVGD with $n$ particles and an appropriate step size sequence drives the kernel Stein discrepancy to zero at an order $1/\sqrt{\log\log n}$ rate. We suspect that the dependence on $n$ can be improved, and we hope that our explicit, non-asymptotic proof strategy will serve as a template for future refinements.
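The kernel Stein discrepancy that the rate statement refers to can be estimated directly from a set of particles. Below is a one-dimensional V-statistic estimator with an RBF kernel, given only as an illustration of the quantity being driven to zero (the paper's analysis rests on its own kernel assumptions, and all names here are illustrative).

```python
import numpy as np

def ksd_rbf(X, score, h=1.0):
    """Squared kernel Stein discrepancy (V-statistic) of 1-d particles X
    against the density whose score function is given, with RBF kernel
    k(x, y) = exp(-(x - y)^2 / (2 h^2))."""
    x, y = X[:, None], X[None, :]
    d = x - y
    K = np.exp(-d ** 2 / (2 * h ** 2))
    sx, sy = score(x), score(y)
    # Stein kernel u_p(x, y) = s(x)s(y)k + s(x) dk/dy + s(y) dk/dx + d2k/dxdy
    u = (sx * sy * K
         + sx * (d / h ** 2) * K
         - sy * (d / h ** 2) * K
         + (1.0 / h ** 2 - d ** 2 / h ** 4) * K)
    return u.mean()
```

The estimate shrinks as the particles better match the target: for a standard normal target (score s(x) = -x), samples drawn from that normal score far lower than the same samples shifted away from it.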